22 research outputs found

Sketching as a Tool for Efficient Networked Systems

Author: Liu Zaoxing
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 15/04/2019

Today, computer systems need to cope with the explosive growth of data in the world. For instance, in data-center networks, monitoring systems are used to measure traffic statistics at high speed, and in financial technology companies, distributed processing systems are deployed to support graph analytics. To handle such large datasets, we usually build efficient networked systems in a distributed manner. Ideally, we expect the systems to meet service-level objectives (SLOs) using the least amount of resources. However, existing systems constructed with conventional in-memory algorithms face the following challenges: (1) excessive resource requirements (e.g., CPU, ASIC, and memory) at high cost; (2) infeasibility at larger scale; (3) data processing that is too slow to meet the objectives. To address these challenges, we propose sketching techniques as a tool to build more efficient networked systems. Sketching algorithms aim to process the data with one or several passes in an online, streaming fashion (e.g., a stream of network packets), and compute highly accurate results. With sketching, we only maintain a compact summary of the entire data and provide theoretical guarantees on error bounds. This dissertation argues for a sketching-based design for large-scale networked systems, and demonstrates the benefits in three application contexts: (i) Network monitoring: we build generic monitoring frameworks that support a range of applications on both software and hardware with universal sketches. (ii) Graph pattern mining: we develop a swift, approximate graph pattern miner that scales to very large graphs by leveraging graph sketching techniques. (iii) Halo finding in N-body simulations: we design scalable halo finders on CPU and GPU by leveraging sketch-based heavy hitter algorithms.
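
To illustrate what "a compact summary with theoretical guarantees on error bounds" can look like, here is a minimal Count-Min sketch in Python. This is a textbook sketching algorithm chosen for illustration only; it is not the universal sketch or any other structure from the dissertation. The class name, the `from_error` helper, and the flow keys are all hypothetical.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative Count-Min sketch (an assumption, not the dissertation's code)."""

    def __init__(self, width, depth):
        self.width = width    # counters per row
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    @classmethod
    def from_error(cls, epsilon, delta):
        # Standard sizing: overestimate <= epsilon * N with probability >= 1 - delta,
        # where N is the total count added to the sketch.
        width = math.ceil(math.e / epsilon)
        depth = math.ceil(math.log(1.0 / delta))
        return cls(width, depth)

    def _index(self, row, key):
        # Salt the hash with the row number to get one hash function per row.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        # One pass over the stream: each item touches `depth` counters.
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions only inflate counters, so the minimum is the tightest estimate
        # and never falls below the true count (for non-negative updates).
        return min(
            self.table[row][self._index(row, key)] for row in range(self.depth)
        )


# Hypothetical packet stream: one heavy flow plus many small ones.
sketch = CountMinSketch.from_error(epsilon=0.01, delta=0.01)
for _ in range(500):
    sketch.add("flow-a")
for i in range(200):
    sketch.add(f"flow-{i}")
print(sketch.estimate("flow-a"))
```

The memory used is `width * depth` counters regardless of how many distinct keys appear in the stream, which is the trade-off the abstract describes: a small, fixed-size summary in exchange for a bounded overestimate.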

JScholarship

HeteroSketch: coordinating network-wide monitoring in heterogeneous and dynamic networks

Authors: Agarwal Anup; Liu Zaoxing; Seshan Srinivasan
Publication date: 18/02/2023

CNS-2107086 - National Science Foundation; CNS-2106946 - National Science Foundation
Published version

Boston University Institutional Repository (OpenBU)

SketchLib: enabling efficient sketch-based monitoring on programmable switches

Authors: Kim Daehyeok; Liu Zaoxing; Namkung Hun; Sekar Vyas; Steenkiste Peter
Publication date: 18/02/2023

CNS-2107086 - National Science Foundation; CNS-2106946 - National Science Foundation
Published version

Boston University Institutional Repository (OpenBU)

Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams

Authors: Ben Basat Ran; Cheng Zhuo; Liu Zaoxing; Manousis Antonis; Sekar Vyas
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/07/2022

Today’s large-scale services (e.g., video streaming platforms, data centers, sensor grids) need diverse real-time summary statistics across multiple subpopulations of multidimensional datasets. However, state-of-the-art frameworks do not offer general and accurate analytics in real time at reasonable costs. The root cause is the combinatorial explosion of data subpopulations and the diversity of summary statistics we need to monitor simultaneously. We present Hydra, an efficient framework for multidimensional analytics that combines a “sketch of sketches”, which avoids the overhead of monitoring exponentially many subpopulations, with universal sketching, which ensures accurate estimates for multiple statistics. We build Hydra as an Apache Spark plugin and address practical system challenges to minimize overheads at scale. Across multiple real-world and synthetic multidimensional datasets, we show that Hydra can achieve robust error bounds and is an order of magnitude more efficient in terms of operational cost and memory footprint than existing frameworks (e.g., Spark, Druid) while ensuring interactive estimation times.

UCL Discovery